
From the TED Talk by Rupal Patel: Synthetic voices, as unique as fingerprints

Unscramble the Blue Letters

RP: Now she's going to go on like this for about three to four hours, and the idea is not for her to say everything that the teagrt is going to want to say, but the idea is to cover all the different combinations of the sounds that ocucr in the language. The more speech you have, the better sounding voice you're going to have. Once you have those recordings, what we need to do is we have to parse these recordings into little snippets of speech, one- or two-sound combinations, sometimes even whole words that start populating a daatest or a datbasae. We're going to call this database a vcioe bank. Now the power of the voice bank is that from this voice bank, we can now say any new utterance, like, "I love chocolate" — everyone needs to be able to say that— fish through that database and find all the segments necessary to say that utertcane.

Open Cloze

RP: Now she's going to go on like this for about three to four hours, and the idea is not for her to say everything that the ______ is going to want to say, but the idea is to cover all the different combinations of the sounds that _____ in the language. The more speech you have, the better sounding voice you're going to have. Once you have those recordings, what we need to do is we have to parse these recordings into little snippets of speech, one- or two-sound combinations, sometimes even whole words that start populating a _______ or a ________. We're going to call this database a _____ bank. Now the power of the voice bank is that from this voice bank, we can now say any new utterance, like, "I love chocolate" — everyone needs to be able to say that— fish through that database and find all the segments necessary to say that _________.

Solution

  1. target
  2. occur
  3. dataset
  4. database
  5. voice
  6. utterance

Original Text

RP: Now she's going to go on like this for about three to four hours, and the idea is not for her to say everything that the target is going to want to say, but the idea is to cover all the different combinations of the sounds that occur in the language. The more speech you have, the better sounding voice you're going to have. Once you have those recordings, what we need to do is we have to parse these recordings into little snippets of speech, one- or two-sound combinations, sometimes even whole words that start populating a dataset or a database. We're going to call this database a voice bank. Now the power of the voice bank is that from this voice bank, we can now say any new utterance, like, "I love chocolate" — everyone needs to be able to say that— fish through that database and find all the segments necessary to say that utterance.
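The lookup step described in the passage, fishing through the voice bank for the segments needed to say a new utterance, can be sketched in a few lines. This is only an illustration under loud assumptions, not the method used in the talk: the bank is a plain dictionary, whole words stand in for the one- or two-sound combinations, the clip names are invented, and `find_segments` is a hypothetical helper that greedily prefers the longest recorded snippet.

```python
from typing import Dict, List


def find_segments(utterance: str, voice_bank: Dict[str, str]) -> List[str]:
    """Cover an utterance with recorded snippets, longest match first.

    Hypothetical helper: keys are snippet transcripts, values are clip IDs.
    """
    words = utterance.lower().split()
    segments: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest remaining span first, then shrink it word by word.
        for j in range(len(words), i, -1):
            key = " ".join(words[i:j])
            if key in voice_bank:
                segments.append(voice_bank[key])
                i = j
                break
        else:
            # No snippet starts at this word, so the bank cannot say it.
            raise KeyError(f"no recorded snippet covers {words[i]!r}")
    return segments


# Invented example bank: transcripts mapped to made-up clip IDs.
bank = {
    "i": "clip_001",
    "love": "clip_002",
    "chocolate": "clip_003",
    "i love": "clip_004",
}

print(find_segments("I love chocolate", bank))  # ['clip_004', 'clip_003']
```

Preferring the longest snippet means fewer joins between clips. Coverage is the other point the passage makes: an utterance can only be assembled if every piece of it was recorded somewhere in the bank.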

Frequently Occurring Word Combinations

ngrams of length 2

collocation              frequency
unique vocal             3
grown man                2
severe speech            2
vocal identities         2
personalized voices      2
vocal identity           2
source characteristics   2
voice bank               2

ngrams of length 3

collocation              frequency
unique vocal identities  2
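Collocation counts like the ones above can be produced by sliding a window of n words over the text. The page does not say how its figures were computed (they appear to come from the full talk, probably with common words filtered out), so the following is only a minimal sketch with a hypothetical `ngram_counts` helper and a simple lowercase tokenizer.

```python
import re
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count every run of n consecutive words in text (case-insensitive)."""
    # Assumed tokenizer: runs of letters and apostrophes, lowercased.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


excerpt = (
    "We're going to call this database a voice bank. Now the power of "
    "the voice bank is that from this voice bank, we can now say any "
    "new utterance."
)

bigrams = ngram_counts(excerpt, 2)
print(bigrams["voice bank"])  # 3
```

Counting over this short excerpt will not reproduce the table, since the listed frequencies do not match the excerpt alone.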

Important Words

  1. bank
  2. call
  3. combinations
  4. cover
  5. database
  6. dataset
  7. find
  8. fish
  9. hours
  10. idea
  11. language
  12. love
  13. occur
  14. parse
  15. populating
  16. power
  17. recordings
  18. segments
  19. snippets
  20. sounding
  21. sounds
  22. speech
  23. start
  24. target
  25. utterance
  26. voice
  27. words